Systematic assessment of imputation performance using the 1000 Genomes reference panels

نویسندگان

  • Qian Liu
  • Elizabeth T. Cirulli
  • Yujun Han
  • Song Yao
  • Song Liu
  • Qianqian Zhu
چکیده

Genotype imputation has been widely adopted in the postgenome-wide association studies (GWAS) era. Owing to its ability to accurately predict the genotypes of untyped variants, imputation greatly boosts variant density, allowing fine-mapping studies of GWAS loci and large-scale meta-analysis across different genotyping arrays. By leveraging genotype data from 90 whole-genome deeply sequenced individuals as the evaluation benchmark and the 1000 Genomes Project data as reference panels, we systematically examined four important issues related to genotype imputation practice. First, in a study of imputation accuracy, we found that IMPUTE2 and minimac have the best imputation performance among the three popular imputing software evaluated and that using a multi-population reference panel is beneficial. Second, the optimal imputation quality cutoff for removing poorly imputed variants varies according to the software used. Third, the major contributing factors to consistently poor imputation are low variant heterozygosity, high sequence similarity to other genomic regions, high GC content, segmental duplication and being far from genotyping markers. Lastly, in an evaluation of the imputability of all known GWAS regions, we found that GWAS loci associated with hematological measurements and immune system diseases are harder to impute, as compared with other human traits. Recommendations made based on the above findings may provide practical guidance for imputation exercise in future genetic studies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Assessment of Genotype Imputation Performance Using 1000 Genomes in African American Studies

Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range o...

متن کامل

Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes

Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most of the imputations have been run using HapMap samples as reference, imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) are not systemically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genom...

متن کامل

Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for referenc...

متن کامل

A combined reference panel from the 1000 Genomes and UK10K projects improved rare variant imputation in European and Chinese samples

Imputation using the 1000 Genomes haplotype reference panel has been widely adapted to estimate genotypes in genome wide association studies. To evaluate imputation quality with a relatively larger reference panel and a reference panel composed of different ethnic populations, we conducted imputations in the Framingham Heart Study and the North Chinese Study using a combined reference panel fro...

متن کامل

Choosing Subsamples for Sequencing Studies by Minimizing the Average Distance to the Closest Leaf

Imputation of genotypes in a study sample can make use of sequenced or densely genotyped external reference panels consisting of individuals that are not from the study sample. It also can employ internal reference panels, incorporating a subset of individuals from the study sample itself. Internal panels offer an advantage over external panels because they can reduce imputation errors arising ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Briefings in bioinformatics

دوره 16 4  شماره 

صفحات  -

تاریخ انتشار 2015